11th iThome Ironman Contest

DAY 27

Self-Challenge Group

Programming Practice series, part 27

Machine Learning and Python Notes

tedtedtedtedted

2019-10-11 00:12:28

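Think of Python and you think of machine learning; think of machine learning and you think of Python.

So let's start by installing Python:

Reference:
Introduction to Jupyter Notebook

In Jupyter, each cell displays the value of its last line right below the cell, which is very convenient for practicing code.

cmd command line:

A detailed explanation of how to set environment variables in cmd
Check whether the Python path is in Path:

set

[Python] Checking the Python version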

python --version

python -V

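Installing pip

Python: installing pip on Windows or CentOS, online/offline installation (pip 101)

Notes on installing Python

1 Check whether Python is in the environment variables

In cmd, type:

echo %path%, or set, or python -V

echo %path% (shows only the path environment variable) or set (shows all environment variables) should list both of these:

C:\Users\ted5\AppData\Local\Programs\Python\Python37\Scripts\;
C:\Users\ted5\AppData\Local\Programs\Python\Python37\;
or
C:\Program Files\Python38\Scripts\
C:\Program Files\Python38\
or
C:\Program Files\Python37\Scripts\
C:\Program Files\Python37\
or
C:\Program Files (x86)\Python37-32\Scripts\
C:\Program Files (x86)\Python37-32\

If they are missing, right-click Computer and choose Properties --> Advanced system settings --> Environment Variables.
You can set either a user variable or a system variable (a system variable applies to all users); either works.
Note: on Windows 7, separate the paths with ; and do not add a ; after the last path.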
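
The same check can also be done from Python instead of reading the output of echo %path% by eye. This is only a minimal sketch: dirs_on_path is a helper name made up for this example, and the sample PATH string and the C:\Py directories are hypothetical values, not paths from a real machine.

```python
import os
import shutil


def dirs_on_path(path_value, wanted_dirs, sep=os.pathsep):
    """Return the wanted directories that appear in a PATH-style string."""
    def norm(p):
        # Ignore surrounding spaces and a trailing slash or backslash.
        return os.path.normcase(p.strip().rstrip("\\/"))

    entries = {norm(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in wanted_dirs if norm(d) in entries]


# Hypothetical PATH value, written the Windows way with ";" as separator.
sample_path = r"C:\Py\Scripts\;C:\Py\;C:\Windows"
print(dirs_on_path(sample_path, [r"C:\Py\Scripts", r"C:\Py"], sep=";"))

# shutil.which reports the executable the shell would run, or None.
print("python:", shutil.which("python"))
print("pip:", shutil.which("pip"))
```

On a real machine, pass os.environ["PATH"] as the first argument and leave sep at its default.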

2 Install pip

How to Install Python PIP on Windows 8 / Windows 10
Download get-pip.py, then run this in cmd:

python get-pip.py

Check whether pip is installed:

pip -V

The pip executable is in:

C:\Users\ted5\AppData\Local\Programs\Python\Python37\Scripts

3 Then check whether these packages are installed

pip install selenium
pip install BeautifulSoup4
pip install requests

If a package is already installed, pip prints Requirement already satisfied. The files of these third-party packages are in:

C:\Users\ted5\AppData\Local\Programs\Python\Python37\Lib\site-packages
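
Besides running pip install again, the check can be scripted. The sketch below uses importlib.util.find_spec from the standard library. is_installed is a helper name made up for this example. The import name can differ from the pip name: BeautifulSoup4 is imported as bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import names, not pip names: BeautifulSoup4 is imported as bs4.
for name in ("selenium", "bs4", "requests"):
    print(name, "installed" if is_installed(name) else "missing")
```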

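4

Finally, run the .py file to see whether it works and what error messages appear.

5 exe file

Then run the exe file (an exe file runs without Python being installed).

How to build the exe file:
First install pyinstaller with pip install pyinstaller
Change to the directory containing the .py file --> pyinstaller -F ted5.py
The exe file is then in the dist folder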

When using Db2, this error appears:
ImportError: DLL load failed: The specified module could not be found (IBM DB2)

Solution:

Pyinstaller -F --add-binary C:\Users\ted5\AppData\Local\Programs\Python\Python37\Lib\site-packages\ibm_db_dlls\ibm_db.dll;.\ibm_db_dlls ^
ted5.py

Other Db2 issues:
One
code to insert values into db2 table in python

Two
How to connect Python to Db2
Install the Db2 driver for Python:

pip install ibm_db

Three: the string is longer than the database column allows:
"String data, right truncation" call BCP

Installing a bunch of packages to practice with:

requests:

Python: using the requests module to make HTTP requests and download web page data (tutorial)
Day-1 Python web crawler life (1)
Python web crawler beginner's notes
Website login using requests library in Python
Python Requests Tutorial: Request Web Pages, Download Images, POST Data, Read JSON, and More

Beautiful Soup:

Python: using Beautiful Soup to fetch and parse web page data and build a web crawler (tutorial)

Problems encountered:
"Unicode Error "unicodeescape" codec can't decode bytes… Cannot open text files in Python 3
How do I fix this cp950 "illegal multibyte sequence" UnicodeDecodeError when reading a text file?
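
Both errors usually come down to two habits. The unicodeescape error appears when a Windows path such as C:\Users is written in a normal string literal, because \U starts a Unicode escape. The cp950 error appears when open() falls back to the system's default encoding on Traditional Chinese Windows. A minimal sketch of the usual fixes follows. The path and file name are hypothetical, and the file is written to a temporary folder so the example runs anywhere.

```python
import os
import tempfile

# A raw string (r"...") keeps the backslashes as they are, so \U is not
# treated as the start of a Unicode escape.
windows_path = r"C:\Users\ted5\notes.txt"  # hypothetical path, never opened here

# Always pass the encoding explicitly instead of relying on the system default.
sample_text = "下一頁 next page"
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "sample.txt")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(sample_text)
    with open(file_path, "r", encoding="utf-8") as f:
        read_back = f.read()

print(read_back == sample_text)
```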

Practice code:

import requests
# import the Beautiful Soup module
from bs4 import BeautifulSoup
r = requests.get("https://www.google.com/search?q=google&oq=google&aqs=chrome..69i57j0l7.1303j0j7&sourceid=chrome&ie=UTF-8")
soup = BeautifulSoup(r.text, "html.parser")
# print(soup)
# the "next page" button has aria-label="下一頁"
u = soup.find(attrs={"aria-label": "下一頁"})
print(u)
if u is None:
    print("next-page link not found")
else:
    # read the attribute with get(); u.href would look for a child <href> tag
    print(u.get("href"))
    url = "https://www.google.com" + str(u.get("href"))  # build the full URL
    print(url)

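However, the page that requests fetches directly is not the same as what the browser shows, so use Selenium:
Installing Python & Selenium on Windows + a simple tutorial

Practice code:

# import the Beautiful Soup module
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
# hand the URL to the browser
driver.get('https://www.google.com/search?q=google&oq=google&aqs=chrome..69i57j0l7.1303j0j7&sourceid=chrome&ie=UTF-8')
# get the page source
pageSource = driver.page_source
# print(pageSource)

soup = BeautifulSoup(pageSource, "html.parser")
# print(soup)
# the page-2 button has aria-label="Page 2"
u = soup.find(attrs={"aria-label": "Page 2"})
print(u)
if u is None:
    print("page-2 link not found")
else:
    # read the attribute with get(); u.href would look for a child <href> tag
    print(u.get("href"))
    url = "https://www.google.com" + str(u.get("href"))  # build the full URL
    print(url)

driver.quit()  # close the browser and end the chromedriver process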

Next, practicing with the urllib package runs into this problem:

Cannot import urllib in Python
That is because in Python 3 urllib is part of the standard library and does not need to be installed.
Practice code:
Python Urllib Module

import urllib.request
request_url = urllib.request.urlopen('https://www.google.com/search?q=google&oq=google&aqs=chrome..69i57j0l7.1303j0j7&sourceid=chrome&ie=UTF-8')
print(request_url.read())

This raises the following error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

So urllib is set aside for now.
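
A likely cause of the 403 is that the server rejects urllib's default User-Agent. This is an assumption; the error message does not say so. A common thing to try is sending a browser-like User-Agent through urllib.request.Request. The sketch below only builds the request and sends nothing. build_request and the USER_AGENT value are made up for this example, and the server may still refuse the real request.

```python
import urllib.request

# Hypothetical browser-like User-Agent; any realistic value could be used.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url):
    """Create a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://www.google.com/search?q=google")
# Request stores header names capitalized, so the lookup key is "User-agent".
print(req.get_header("User-agent"))

# Sending it needs network access:
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```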

requests cookies:

How to add a cookie to the cookiejar in python requests library

import requests
# a response that carries cookies
r = requests.get("https://www.google.com/search?q=google&oq=google&aqs=chrome.0.69i59l3j0l4j69i60.1431j0j7&sourceid=chrome&ie=UTF-8")

# read the cookies
print(r.cookies)
print(r.cookies['1P_JAR'])
print(r.cookies['NID'])
# print(r.cookies['CGIC'])  # raises requests.cookies.CookieConflictError: There are multiple cookies '
# check the type
print(type(r.cookies))

Selenium:

Selenium 3 Python Chrome Driver
Python Selenium practice part 1: handling Alert pop-ups
4. Locating elements
Python selenium: operating the drop-down list of a select tag
Check if any alert exists using selenium with python
Selenium Framework for Beginners 32 | How to use Headless Chrome with Selenium
How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
WebDriver API
Change user agent for selenium driver
how to use while loop to repeat task in selenium
Refreshing web page by WebDriver when waiting for specific condition
Python Selenium Get HTML Page Source
Selenium Compound class names not permitted
Selenium xpath no such element exception even though it works in firepath
Problems encountered:
One

UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk

Current workaround:
'UCS-2' codec can't encode characters in position 1050-1050
For now I use the approach from reply 1, but some characters still cannot be displayed.

The approach from reply 2, encode('unicode-escape').decode('utf-8'),
turns the text into something like \u0000\uaab0\U0004444h

Reply 3: upgrade to Python 3.8. With Python 3.8 the error indeed no longer occurs, and the special symbols are displayed correctly (both in Excel and in Python).
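
For anyone who cannot upgrade, here is a minimal sketch of two workarounds of this kind. The first replaces every character above U+FFFF (outside the BMP). It is one common approach and not necessarily what reply 1 does. The second is the unicode-escape conversion quoted above. replace_non_bmp and escape_non_ascii are names made up for this example.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Replace characters outside the Basic Multilingual Plane (above U+FFFF)."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


def escape_non_ascii(text):
    """Turn non-ASCII characters into backslash escape sequences."""
    return text.encode("unicode-escape").decode("utf-8")


sample = "ok \U0001F600"  # U+1F600 is an emoji outside the BMP
print(replace_non_bmp(sample, "?"))
print(escape_non_ascii(sample))
```

The first function loses the original character, while the second keeps the information but is unreadable, which matches the trade-off described above.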

Two
Can not click on a Element: ElementClickInterceptedException in Splinter / Selenium

Solution used:

ActionChains(driver).move_to_element(element).click(element).perform()

Three
Scrolling a web page:
How can I scroll a web page using selenium webdriver in python?
However, the scroll-a-web-page approach does not work when an element has to be scrolled:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
has little effect, because the window is already at the bottom (the element is not at its bottom yet and can still scroll, so the element is what has to be scrolled).

Four: shutting down chromedriver.exe:
release Selenium chromedriver.exe from memory
See answer 2

browser.close() will close only the current chrome window.
browser.quit() should close all of the open windows, then exit webdriver.

References (for scrolling an element, item Three):
Selenium execute_script window.scrollTo not scrolling window
Scrolling to element using webdriver?
Solution used:

driver.execute_script("arguments[0].scrollIntoView(true);", element)

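Summary:
1
Regex for an IP address:

\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:/\d{1,2})?
For example: 255.255.255.255/14

2
For PHP's preg_match($pattern, $input, $matches, $flags)
the pattern has to be written as:

/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:\/\d{1,2})?/
that is, a / delimiter is added at the very beginning and the very end, and the / inside the pattern is escaped as \/

$matches is an array; when the PREG_OFFSET_CAPTURE flag is passed, it becomes a two-dimensional array [][]
For example, with $input = "255.255.255.255/14" and $flags = PREG_OFFSET_CAPTURE
$matches[0][0] is "255.255.255.255/14"
$matches[0][1] is 0 (the index in the string where the match starts)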
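
To tie this back to Python, here is a minimal sketch of the same idea with the re module. It writes each dot as \. so that it matches a literal dot, and uses m.group(0) and m.start() to get what PHP returns in $matches[0][0] and $matches[0][1]. IP_PATTERN and loose are names made up for this example. The pattern only checks the shape of the text, not that each number is at most 255.

```python
import re

# Escaped dots match a literal "."; the optional group matches a suffix like "/14".
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:/\d{1,2})?")

m = IP_PATTERN.search("255.255.255.255/14")
print(m.group(0), m.start())

# With unescaped dots, "." matches any character, so non-IP text also matches.
loose = re.compile(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}")
print(bool(loose.fullmatch("1a2b3c4")), bool(IP_PATTERN.fullmatch("1a2b3c4")))
```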

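A rough understanding of machine learning:

Reference: ML Lecture 0-1: Introduction of Machine Learning

These are my notes after watching the video; they may contain errors:

1 Artificial Intelligence: the hope that machines can be as intelligent as humans.

2 Machine Learning: letting the machine learn by itself. If artificial intelligence is the goal, machine learning is the means. Machine learning is not writing a pile of if statements that map inputs to outputs, because there would be no end to them; speech recognition, for example, could never be finished that way. Instead, machine learning looks for a function: feed in a speech clip and the function that was found outputs "how are you"; feed in an image and it outputs "this image is a cat"; feed in a Go game record and it outputs the next move. How is this function found, and how does the machine learn by itself? There are many methods, such as Supervised learning and Reinforcement learning.

3 Deep learning: there are several ways for a machine to learn by itself, and deep learning is one of them.
Reference: Understand in 3 minutes what is "deep" about deep learning

4 Supervised learning: also one kind of machine learning method. Given a picture of a cat, how does a function output "cat"? First, many candidate functions are needed; this set of functions is called the model. Some functions are good and some are bad: some see a cat as a dog, others see a cat as a cat. So the second step needs training data, which provides both inputs and outputs: the input is a picture of a cat and the output is "cat". The training data is used to judge which functions in the model are right and which are wrong, and then an algorithm picks the one function that most accurately identifies the picture as a cat. This process is called training. Finally, the accuracy of that function is checked: take a picture of a cat, pass it through the chosen function, and the output should be "cat". This process is called testing.

5 Regression: the function's output is a scalar (a number). For example, feed in the PM2.5 values of the previous several days and the function outputs today's PM2.5 value.

6 Classification: there are two kinds, Binary Classification and Multi-class Classification. In Binary Classification the function outputs yes or no, as in spam detection; in Multi-class Classification the function outputs one class out of several, as in news categorization.

7 label: the output that the function is expected to produce for a training input is also called the label.

8 Unsupervised Learning: in supervised learning the training data has both inputs and outputs: the input is a picture of a cat and the output is "cat". In unsupervised learning there are only the cat pictures, without the outputs, and the machine has to work out by itself what the function should output.

Again, these notes may contain errors.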
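
The training and testing steps in item 4 can be sketched in a few lines of plain Python. This is only a toy illustration under made-up assumptions: the training data follows the rule y = 2 * x, the "model" is a handful of candidate functions f(x) = w * x, and the selection algorithm simply tries every candidate. Real machine learning uses far larger function sets and smarter search.

```python
# Training data: (input, expected output) pairs; the hidden rule is y = 2 * x.
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# The "model" is a set of candidate functions f(x) = w * x, one for each w.
candidate_weights = [0.5, 1.0, 1.5, 2.0, 2.5]


def total_squared_error(w, data):
    """Measure how badly f(x) = w * x fits the data (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in data)


# Training: pick the candidate with the lowest error on the training data.
best_w = min(candidate_weights, key=lambda w: total_squared_error(w, training_data))

# Testing: try the chosen function on an input it has not seen.
prediction = best_w * 4.0
print(best_w, prediction)
```

Because the output is a number, this is also a tiny example of the Regression case in item 5.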

C and C++ pointers; Java call by value and call by reference

Big data and machine learning terminology
